Code Relatives: Detecting Similar Software Behavior
Abstract
Detecting “similar code” is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, code fragments that behave alike without sharing similar syntax may be missed. In this paper, we propose the term “code relatives” to refer to code with dynamically similar execution features. Code relatives can be used for tasks such as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link-analysis-based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed 290+ million prospective subgraph matches. The results show that DyCLINK detects not only code relatives, but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average compared with the competitor’s 61%.
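To make the idea of comparing code by its execution rather than its syntax more concrete, the following is a minimal illustrative sketch and not DyCLINK's actual algorithm. It assumes a hypothetical trace format, a list of `(opcode, dependency_indices)` pairs, builds a small dependency graph from it, ranks instructions with a plain PageRank (one common form of link analysis), and compares the rank-ordered opcode sequences of two traces. The function names, the trace format, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def pagerank(num_nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(num_nodes) if out_degree[i] == 0)
        base = (1 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


def rank_ordered_opcodes(trace):
    """Order a trace's opcodes by the PageRank of their graph nodes.

    `trace` is a hypothetical format: a list of (opcode, deps) pairs, where
    deps lists the indices of earlier instructions this one depends on.
    """
    edges = [(dep, i) for i, (_, deps) in enumerate(trace) for dep in deps]
    rank = pagerank(len(trace), edges)
    order = sorted(range(len(trace)), key=lambda i: (-rank[i], i))
    return [trace[i][0] for i in order]


def similarity(trace_a, trace_b):
    """Similarity in [0, 1] between two traces (illustrative measure only)."""
    seq_a = rank_ordered_opcodes(trace_a)
    seq_b = rank_ordered_opcodes(trace_b)
    return SequenceMatcher(None, seq_a, seq_b).ratio()


# Two tiny hypothetical traces that differ in a single arithmetic instruction.
trace_add = [("load", []), ("load", []), ("add", [0, 1]), ("store", [2])]
trace_mul = [("load", []), ("load", []), ("mul", [0, 1]), ("store", [2])]

if __name__ == "__main__":
    print(similarity(trace_add, trace_add))
    print(similarity(trace_add, trace_mul))
```

Ordering instructions by rank before comparing them is one way to reduce a graph to a sequence so that cheap sequence comparison can be used. Whether and how the paper's system does this is not stated in the abstract.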
Similar resources
Scalable Detection of Similar Code: Techniques and Applications
Similar code, also known as cloned code, commonly exists in large software. Studies show that code duplication can incur higher software maintenance cost and more software defects. Thus, detecting similar code and tracking its migration have many important applications, including program understanding, refactoring, optimization, and bug detection. This dissertation presents novel, general techn...
A Clone Detection Approach for a Collection of Similar Large-Scale Software Products
Reusing existing software, with or without modification, is a common way to develop new large software at low cost and with high quality. Many techniques and tools have been proposed for detecting reused pieces in source code. However, existing tools scale poorly: they consume large amounts of memory and time when detecting reused pieces in large-scale software. In this paper, we proposed an ...
Similar Code Detection and Elimination for Erlang Programs
A well-known bad code smell in refactoring and software maintenance is duplicated code, that is the existence of code clones, which are code fragments that are identical or similar to one another. Unjustified code clones increase code size, make maintenance and comprehension more difficult, and also indicate design problems such as a lack of encapsulation or abstraction. This paper describes an...
Extracting Source Level Program Similarities from Dynamic Behavior
The vast majority of work on comparing program similarities to detect software piracy either assumes the availability of the program source code (e.g., Moss) or performs a complicated source program transformation to embed carefully designed signatures, or software watermarks, into the binary code. In this paper, we propose a new approach to detecting program similarities that requires neither ...
Using Code Instrumentation for Debugging and Constraint Checking
The members of the Committee appointed to examine the thesis of FILARET ILAS find it satisfactory and recommend that it be accepted. Acknowledgments: This work would not have been possible without the support and encouragement of my advisor Dr. Orest Pilskalns, under whose supervision I chose this topic and began this thesis. I would like to thank Dr. Scott Wallace for his guidance during this...
Publication date: 2015